Fix issue 370 #414

Open · wants to merge 1 commit into master
Conversation

@h33p (Contributor) commented Jul 2, 2024

Closes #370

The issue is triggered because QEMU/virsh-generated ELF dumps place the strtab at the end of the dump file. Such dumps are large, so it is infeasible to provide the entire file to the parser. This PR relaxes the parser's requirements and returns a default strtab if the initial size check fails.
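The PR's idea can be sketched in isolation. Below is a minimal stand-in (the `Strtab` type and `parse_strtab_relaxed` function are illustrative, not goblin's actual internals): if the strtab's declared byte range lies beyond the chunk of the file we were handed, fall back to an empty default table rather than failing the whole parse.

```rust
// Stand-in for goblin's Strtab; the real type wraps raw bytes similarly.
#[derive(Debug, Default, PartialEq)]
struct Strtab(Vec<u8>);

// Hypothetical sketch of the PR's relaxed behavior: an out-of-bounds
// strtab range yields a default (empty) table instead of an error.
fn parse_strtab_relaxed(bytes: &[u8], offset: usize, len: usize) -> Strtab {
    offset
        .checked_add(len)
        .and_then(|end| bytes.get(offset..end))
        .map(|slice| Strtab(slice.to_vec()))
        // The strtab lives past the end of the provided chunk
        // (e.g. the tail of a huge QEMU dump): default it.
        .unwrap_or_default()
}
```

This trades an early parse error for an empty table, which is exactly the trade-off questioned below.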
@m4b (Owner) left a comment
Hmm, I'm not sure this is the best approach to fixing the issue.

In general, if the file is huge, I'd expect it to be memmap'd and lazily paged in as the user iterates/accesses parts. That should be sufficient to parse a huge coredump file without crashing.

Secondly, unless I've misunderstood, if the strtab is out of bounds, the file is malformed; unless only a chunk of the file is being loaded and passed to goblin, but goblin's parse was never really designed for that use case. In that case, the user should probably use lazy_parse, since strtab will be defaulted there.

Lastly, I'm also not sure this really fixes any issue; won't subsequent lookups all fail due to the empty strtab, so this is really just kicking the can down the road?

@h33p (Contributor, Author) commented Jul 16, 2024

@m4b thank you for your reply!

> Secondly, unless I've misunderstood, if the issue is that the strtab is out of bounds, this means that the file is malformed, unless only a chunk of the file is being loaded and passed to goblin;

Exactly. Just as in #370, only the head chunk is passed to goblin. The file itself is not malformed.

> In that case, probably the user should use lazy_parse, since strtab will get defaulted there.

That is probably the right way to go about this. My use case is only reading the program headers, nothing more.

This means I can just use ProgramHeader::parse; however, it requires a scroll Ctx. Is there a neat way to access/build that without copying the logic in parse_misc?
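For context on what that Ctx carries: ELF parsing context boils down to two bytes of the identification array, EI_CLASS (32- vs 64-bit container) and EI_DATA (endianness). The following self-contained illustration uses stand-in types (this is not goblin's API, just the ELF-spec logic behind it):

```rust
// e_ident byte offsets, per the ELF specification.
const EI_CLASS: usize = 4; // 1 = ELFCLASS32, 2 = ELFCLASS64
const EI_DATA: usize = 5;  // 1 = ELFDATA2LSB, 2 = ELFDATA2MSB

// Stand-in for the container/endianness pair a scroll Ctx encodes.
#[derive(Debug, PartialEq)]
struct Ctx {
    is_64: bool,
    little_endian: bool,
}

// Derive the parsing context from the leading bytes of an ELF file.
fn ctx_from_ident(ident: &[u8]) -> Option<Ctx> {
    Some(Ctx {
        is_64: *ident.get(EI_CLASS)? == 2,
        little_endian: *ident.get(EI_DATA)? == 1,
    })
}
```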

@m4b (Owner) commented Jul 21, 2024

Ok, I see your issue. So, this is primarily because lazy_parse isn't fleshed out in a consistent way. If I understand correctly, you would do:

```rust
let mut elf = Object::lazy_parse(bytes)?;
// how to populate program_headers?
```
So I think the best path forward here will be one of:

  1. exposing the ctx by making the field pub, or adding a pub fn ctx() (I don't want to do this)
  2. adding some helper methods, especially for e.g. program_headers and section_headers, that take care of passing the various fiddly bits (I think this is ideal, but I'm not sure whether to use a builder-esque pattern or mutating methods on the Elf struct)

To flesh out option 2 a bit, I mean something like:

```rust
pub fn parse_program_headers(&mut self, bytes: &[u8]) -> error::Result<()> {
    let program_headers = ProgramHeader::parse(
        bytes,
        self.header.e_phoff as usize,
        self.header.e_phnum as usize,
        self.ctx,
    )?;

    // Extract the interpreter path from PT_INTERP, if present.
    let mut interpreter = None;
    for ph in &program_headers {
        if ph.p_type == program_header::PT_INTERP && ph.p_filesz != 0 {
            let count = (ph.p_filesz - 1) as usize;
            let offset = ph.p_offset as usize;
            interpreter = bytes
                .pread_with::<&str>(offset, ::scroll::ctx::StrCtx::Length(count))
                .ok();
        }
    }

    self.program_headers = program_headers;
    self.interpreter = interpreter;
    Ok(())
}
```

Obviously this would forward to either private static methods that return the appropriate types, reducing code duplication, or we'd use &mut self methods to construct/parse whichever components we want.

So there are a number of things I would change at this point, which is somewhat unrelated to your request to just parse the program headers:

  1. We should probably introduce a builder or a &mut self API
  2. lazy_parse should have taken some bytes, parsed the header, and defaulted the rest
  3. parse calls lazy_parse, and then uses the above builder API to construct the remaining structs
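The lazy-parse-then-populate flow sketched in these steps can be illustrated with toy stand-in types (none of this is goblin's actual API): parse only the fixed-size header up front, default everything else, then let the caller feed in just the bytes covering each component.

```rust
#[derive(Debug, Default)]
struct Elf {
    phnum: usize,              // stand-in for header.e_phnum
    program_headers: Vec<u32>, // defaulted until explicitly parsed
}

impl Elf {
    // Parse only the "header" (here: a single length byte); default the rest.
    fn lazy_parse(bytes: &[u8]) -> Option<Self> {
        let phnum = *bytes.first()? as usize;
        Some(Elf { phnum, ..Default::default() })
    }

    // Later, the caller supplies just the bytes covering the phdr table,
    // so the full dump never has to be loaded.
    fn parse_program_headers(&mut self, bytes: &[u8]) -> Option<()> {
        self.program_headers = bytes.get(..self.phnum)?.iter().map(|&b| b as u32).collect();
        Some(())
    }
}
```

The eager `parse` would then be a thin wrapper that calls `lazy_parse` followed by each `parse_*` method.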

I know this probably isn't what you wanted to hear, and while I understand your patch may "fix" the issue for you, it unfortunately isn't the right solution to the problem. Specifically, you want to lazy_parse, then supply just, e.g., the program_header bytes and parse from those to get the information you need.

Fortunately, I don't think this is a particularly hard problem to fix in a better manner; as I sort of outlined above, it's just slightly more work :) Let me know if you have any questions, and thanks for your patience and for explaining your use case!

Successfully merging this pull request may close these issues.

Failing to parse QEMU memory dump note .shstrtab